Modelling potential environmental impacts of science activity in Antarctica
Abstract
We use GPS data collected on a science expedition in Antarctica to estimate hiking functions for the speed at which humans traverse terrain differentiated by slope and by ground cover (moraines and rock). We use the estimated hiking functions to build weighted directed graphs as representations of specific environments in Antarctica. From these we use a variety of graph metrics, particularly betweenness centrality, to estimate the relative potential for human environmental impacts arising from scientific activities in those environments. We also suggest a simple approach to planning science expeditions that might allow for reduced impacts in these environments.
1 Introduction
Antarctica is often considered a vast, desolate continent devoid of human activity. However, there are small, scientifically significant areas where substantial, regular human activity occurs. Globally unique ice-free areas within Antarctica, such as the McMurdo Dry Valleys, are of particular scientific interest, and a significant amount of activity therefore occurs in this specially managed area. Human activity in Antarctica is highly dependent on the infrastructure that supports people to survive in this environment. Aside from a reduced period of activity caused by COVID-19, human activity on the continent continues to grow (Tejedo et al. 2024), and with it both the actual and the potential future impacts of these activities. The McMurdo Dry Valleys are close to the largest research base on the continent, McMurdo Station, and the pressures on this unique environment from human activity are increasing. Understanding the scope and implications of human activity in key parts of Antarctica is critical to supporting environmental management of the continent.
The history of understanding the role and impacts of human activity in Antarctica starts with the Sixth Antarctic Treaty Consultative Meeting (ATCM) in 1970. A recommendation of the final report of this meeting was for member countries to encourage research on how human activity impacts Antarctica and its ecosystems (ATCM 1970 Recommendation VI-4). By the mid-1980s, it was recognised that scientific activity, particularly the logistics required to support that activity, required additional scrutiny, and also that scientific activity could pose a greater threat to Antarctica’s ecosystems than tourism (Benninghoff and Bonner 1985). At this point, the need for more understanding and the ability to forecast the environmental impact of human activity was recognised (ATCM 1985, para. 46).
Moving into the mid-1990s, the multi-disciplinary scope of large programmes of research such as the McMurdo Dry Valleys Long Term Ecological Research (MCM LTER) programme increased the number of researchers in the field, while also supporting more research into the history of human activity in the Dry Valleys (Priscu and Howkins 2016). With the establishment of the Committee for Environmental Protection (CEP) in 1998, human activities (including science activities) were to be monitored, but attention moved away from the understanding and scope of the activities and became more focused on their potential resulting impacts. For example, the current 5-year Work Plan for the CEP (REF) has several items associated with human activities, but these are focused on managing the impact of human activities rather than quantifying the current level of activity, or forecasting cumulative impacts of the activity.
More recent research has focused either on a continental-scale picture of human activity (Leihy et al. 2020), or detailed mapping of the infrastructural footprint and impacts that the numerous Antarctic stations have on the continent (Brooks et al. 2018; Brooks et al. 2019). These approaches tackle the challenge of understanding human activity from contrasting spatial perspectives. The present research aims to complement these two techniques, by using high resolution mapping of human activity in the dry valleys to extrapolate how scientists navigate in these landscapes, or could potentially navigate in these landscapes.
In the next section we outline our approach and discuss related work. We then describe data sources, both general environmental and topographic, and the detailed GPS data from which our findings are derived. Procedures for deriving hiking functions and applying them to build hiking networks as representations of the dry valleys landscapes are explained, followed by examples of analyses possible using this approach, particularly the application of measures of graph betweenness centrality as proxies for potential impact. We also outline how these methods might be used to plan for reduced impact from scientific expeditions to these environments. The paper concludes with a discussion of our findings, prospects for further work, and some general conclusions.
3 Data sources
3.1 Antarctic geospatial data
Geospatial data for Antarctica were obtained from sources referenced in Cox et al. (2023b), Cox et al. (2023a), and Felden et al. (2023). The study area was defined to be the Skelton and Dry Valleys basins, as defined by the NASA Making Earth System Data Records for Use in Research Environments (MEaSUREs) project (Mouginot and University Of California Irvine 2017) and shown in Figure 3(a). The Skelton basin was included because while the expedition GPS data was ostensibly collected in the McMurdo Dry Valleys, it actually extends into that basin as shown in Figure 3(b). Elevation data from the Reference Elevation Model of Antarctica (REMA) project Howat et al. (2022), and geology from GeoMAP Cox et al. (2023b) are shown in Figure 3(c). The five largest areas of contiguous non-ice surface geology across the study area shown in Figure 3(d) were chosen to be the specific sites for more detailed exploration using the methods set out in this paper. These range in size from around 320 to 2600 square kilometres.
3.2 GPS data from an expedition
GPS data recording how scientists moved around the McMurdo Dry Valleys was collected using QStarz Q1100P GPS Tracking Recorders between 2016 and 2018 (see Figure 4). These units were selected for their long battery life (40+ hours), chipset sensitivity, and simple design and use.
Each unit was set to record a location every 30 seconds. Scientists turned on the devices when leaving base camp in the morning and turned them off when returning at the end of the day. The units were attached to the top of a backpack to ensure units had good access to the GPS signal.
Participants were sourced through existing networks of scientists supported by the New Zealand Antarctic Programme who were going to Antarctica for research purposes. All participants were asked if they wished to carry a GPS recorder. The study was entirely voluntary. Over the duration of the two primary field events, 40 people used the devices. Social Ethics approval was granted by the Manaaki Whenua Landcare Research Social Ethics committee (SE 1617/05).
Sample output of the GPS data is shown in Figure 5.
It was necessary to process the GPS data to make them better suited for use in the estimation of hiking functions. The first processing step was to confirm the plausibility of the data, particularly the device-generated speeds, distances between fixes, and elevations associated with fixes. The challenges of post-processing GPS data are well documented and relate to issues with GPS drift, which can lead to estimated non-zero movement speeds as a result of noise in the signal. The raw GPS data included distance since last fix, speed, and elevation estimates, and it was determined in all cases that the device-generated results for these measurements were likely to be more reliable than values calculated by post-processing the raw latitude-longitude fixes.
The second processing step was to remove fixes associated with faster movement on modes of transport other than walking. Wood et al. (2023) cite a number of previous works that detect trip segments based on recorded speeds. This method was trivially applicable to our data, which include only a limited number of non-walking segments, as scientists arrive at the expedition base camp, and occasionally travel to more remote experimental sites, by helicopter.
The third, more challenging processing step was to deal with sequences of fixes associated with non-purposeful movement when scientists were in or around base camp, at rest stops, or at experimental sites. Crude filters removed fixes with low recorded distances between fixes (less than 2.5 metres), high turn angles at the fix location (greater than 150°), and fixes recorded on ice-covered terrain, but this did not clean the data sufficiently for further analysis. An additional filtering step was to count fixes (across all scientists) in square grid cells and remove all fixes in grid cells with more than 50 fixes.
This left one persistent concern: an over-representation of consecutive fixes recorded at exactly the same elevation, resulting in many fixes with estimated slopes of exactly 0, and leading to a clearly evident dip in estimated movement speeds at 0 slope (Figure 6(a)). It is likely that these fixes are associated with GPS device drift, so it was decided to remove all fixes where estimated slope was exactly 0. Figure 6(b) shows the improvement in even a crudely estimated hiking function derived from locally estimated scatterplot smoothing (LOESS). Note that such functions are likely overfitted and are not used further in our analysis, where we favour more easily parameterised functions such as those discussed in Section 2.1.
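The cleaning steps above might be sketched as follows. This is an illustrative Python reimplementation, not the code used in the study: the distance (2.5 m), turn-angle (150°), grid-count (50 fixes), and zero-slope filters follow the text, while the fix record format, grid cell size, and helper names are hypothetical, and the ice-cover filter (which requires geology data) is omitted.

```python
import math
from collections import Counter

# Thresholds from the text; CELL_M is an assumed grid cell size.
MIN_STEP_M = 2.5        # minimum distance between consecutive fixes
MAX_TURN_DEG = 150.0    # maximum turn angle at a fix
MAX_CELL_FIXES = 50     # maximum fixes per grid cell (all scientists pooled)
CELL_M = 100.0          # grid cell size (illustrative assumption)

def turn_angle(p0, p1, p2):
    """Absolute change of heading (degrees) at fix p1."""
    h1 = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
    h2 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    d = math.degrees(abs(h2 - h1))
    return min(d, 360.0 - d)

def filter_fixes(fixes):
    """fixes: list of (x, y, dist_since_last_m, slope) tuples in a
    projected CRS. Returns the fixes surviving the crude filters."""
    kept = []
    for i in range(1, len(fixes) - 1):
        x, y, step, slope = fixes[i]
        if step < MIN_STEP_M:
            continue                      # stationary jitter / GPS drift
        if turn_angle(fixes[i - 1][:2], (x, y), fixes[i + 1][:2]) > MAX_TURN_DEG:
            continue                      # implausible doubling back
        if slope == 0.0:
            continue                      # suspect drift at constant elevation
        kept.append(fixes[i])
    # density filter: drop fixes in over-visited grid cells
    cells = Counter((int(f[0] // CELL_M), int(f[1] // CELL_M)) for f in kept)
    return [f for f in kept
            if cells[(int(f[0] // CELL_M), int(f[1] // CELL_M))] <= MAX_CELL_FIXES]

# tiny hypothetical track: two good fixes, one too short, one zero-slope
demo = [(0, 0, 10, 0.1), (10, 0, 10, 0.1), (20, 0, 1.0, 0.1),
        (30, 0, 10, 0.0), (40, 0, 10, 0.1), (50, 0, 10, 0.1)]
kept = filter_fixes(demo)
```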
4 Methods and results
4.1 Hiking functions
We fit three alternative functional forms to the cleaned GPS data: exponential (Tobler 1993), Gaussian (following Irmischer and Clarke 2018), and Lorentz (following Campbell et al. 2019) using the Levenberg-Marquardt algorithm (Moré 1978) as provided by the nlsLM function in the minpack.lm R package (Elzhov et al. 2022). The raw data and fitted curves are shown in Figure 7.
Both the Lorentz and Gaussian functions fit clearly better than the exponential form, but the improvement offered by the Lorentz function over the Gaussian is marginal: residual standard error 1.489 vs 1.491 on moraine, and 1.487 vs 1.488 on rock. Moreover, inspection of the curves shows that the estimated hiking speeds of the Gaussian functions fall much closer to a plausible zero on very steep slopes. We therefore chose to adopt Gaussian hiking functions for the remainder of the present work.
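For illustration, the sketch below recovers Gaussian hiking function parameters from synthetic data generated with the moraine values reported in Equation 3 below. The paper fits the nonlinear form directly with Levenberg-Marquardt (nlsLM in R); here, purely as a sketch, we exploit the fact that the Gaussian is quadratic in log-space, so ordinary polynomial least squares suffices.

```python
import numpy as np

# Synthetic stand-in for the cleaned GPS data: speeds drawn from the
# moraine parameters of Equation 3 with multiplicative noise.
rng = np.random.default_rng(1)
theta = rng.uniform(-0.6, 0.6, 2000)                  # slope (as a tangent)
speed = 4.17 * np.exp(-(theta + 0.0526) ** 2 / 0.236)  # v_moraine, Equation 3
speed *= np.exp(rng.normal(0.0, 0.05, theta.size))     # noise

# log v = a*theta^2 + b*theta + c, where a = -1/s, b = -2*mu/s,
# c = ln(vmax) - mu^2/s for the Gaussian v = vmax * exp(-(theta+mu)^2/s)
a, b, c = np.polyfit(theta, np.log(speed), 2)
s = -1.0 / a
mu = b / (2.0 * a)
vmax = np.exp(c - a * mu ** 2)
```

With enough fixes the recovered parameters land close to the generating values, which is all the sketch is meant to show; the direct nonlinear fit is preferable in practice because it weights errors on the speed scale rather than the log-speed scale.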
In previous work researchers have applied a ground cover penalty cost to a base hiking function to estimate traversal times. We instead, as shown, estimate different hiking functions for the two ground cover types present. The peak speed on rock is attained on steeper downhill slopes than on moraines, perhaps indicative of the greater care required on downhill gravel slopes. Meanwhile the highest speeds on level terrain are attained on moraines.
Plotting both hiking functions along with an additional model fitted to all the data on the same axes confirms that the fitted functions are sufficiently different to retain separate models for each ground cover (see Figure 8). Plotting both functions in the same graph makes clearer the difference in maximum speed and slope at maximum speed associated with each ground cover.
The estimated hiking functions associated with the two land covers are \[ \begin{array}{rcl} v_{\mathrm{moraine}} & = & 4.17\,\exp\left[{-\frac{(\theta+0.0526)^2}{0.236}}\right] \\ v_{\mathrm{rock}} & = & 3.76\,\exp\left[{-\frac{(\theta+0.119)^2}{0.365}}\right] \end{array} \tag{3}\]
where the different maximum speeds (in kilometres per hour) and slopes of maximum speed are apparent.
4.2 Landscapes as graphs
We developed R code (R Core Team 2025) to build graphs (i.e. networks) with hexagonal lattice structure and estimated traversal times for graph edges derived from our hiking functions. Graphs are manipulated as igraph package (Csárdi and Nepusz 2006; Csárdi et al. 2024) graph objects for further analysis. An important decision in constructing graphs is the choice of the spacing of the hexagonal lattice, and also of the underlying DEM from which graph vertex elevations are derived. Given the extent of the study sites (see Figure 3(d)) it was decided that a hexagonal lattice (see Figure 2) with hexagons equivalent in area to 100 metre square cells was appropriate. The hexagon centre to centre spacing of this lattice is given by \[
100\sqrt{\frac{2}{\sqrt{3}}}\approx 107.5\,\mathrm{metres}
\tag{4}\] Given this lattice resolution we interpolated vertex elevations from a 32m resolution DEM from the REMA project (Howat et al. 2022) by bilinear interpolation using the R terra package (Hijmans 2024). It would be straightforward to derive vertex elevations from a more detailed DEM if required.
Edge weights (i.e. estimated traversal costs) are assigned by calculating the slope of each directed edge, the estimated hiking speed for that slope, and thus finding how long it should take for an edge to be traversed. If the estimated traversal time of an edge in the nominal 100 metre lattice is greater than 30 minutes then it is removed from the graph, along with its ‘twin’ edge in the opposite direction. Removing edges in both directions is partly a practical matter as it simplifies the operation of many graph algorithms, but can also be justified on the basis that a slope steep enough to be a barrier to ascent is unlikely to be traversed when descending.
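Under the assumption of a Python/networkx reimplementation of our R/igraph code, the construction might be sketched as follows. Vertices at hexagon centres have six nearest neighbours, which is the adjacency of a triangular lattice; the elevation surface is supplied as a callable for illustration, and the moraine parameters from Equation 3 stand in for whichever hiking function applies.

```python
import math
import networkx as nx

SPACING = 100 * math.sqrt(2 / math.sqrt(3))   # ~107.5 m, Equation 4
MAX_EDGE_MINUTES = 30.0                       # pruning threshold from the text

def hiking_speed(theta):
    """Moraine Gaussian hiking function from Equation 3 (km/h)."""
    return 4.17 * math.exp(-(theta + 0.0526) ** 2 / 0.236)

def build_hiking_network(elevation):
    """elevation: callable (x, y) -> metres. Returns a directed graph on
    a triangular lattice (hexagon centres, six neighbours each) with
    estimated traversal times in minutes as edge weights."""
    lattice = nx.triangular_lattice_graph(20, 20, with_positions=True)
    pos = {v: (x * SPACING, y * SPACING)
           for v, (x, y) in nx.get_node_attributes(lattice, "pos").items()}
    z = {v: elevation(*pos[v]) for v in lattice}
    G = nx.DiGraph()
    for u, v in lattice.edges():
        for a, b in ((u, v), (v, u)):        # one directed edge each way
            theta = (z[b] - z[a]) / SPACING  # slope as rise over run
            minutes = (SPACING / 1000) / hiking_speed(theta) * 60
            if minutes <= MAX_EDGE_MINUTES:
                G.add_edge(a, b, weight=minutes)
    # drop any 'twin' edge whose opposite direction was removed
    G.remove_edges_from([(a, b) for a, b in list(G.edges())
                         if not G.has_edge(b, a)])
    return G

# demo: flat terrain, so every edge is traversable in both directions
G_flat = build_hiking_network(lambda x, y: 0.0)
```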
After removal of all such edges only the largest connected graph component is retained, so that the resulting hiking network representation is fully connected, with no vertices unreachable from elsewhere in the network. A map of the fifth largest study area’s hiking network is shown in Figure 9. This network includes 30,697 vertices and 174,798 directed edges. The largest of the five study sites (see Figure 3(d)) results in a network containing almost a quarter of a million vertices and over 1.4 million directed edges. A hiking network can be used to explore many connectivity properties of the environment. For example, for a chosen origin point, a shortest path tree can be derived showing the least-cost routes from that origin to every other vertex (Figure 9(c)).
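These two operations, retaining the largest strongly connected component and deriving a shortest path tree from an origin, can be sketched with networkx on a toy directed grid; the grid and its asymmetric weights are illustrative only.

```python
import networkx as nx

# Toy stand-in for a hiking network: a directed grid with traversal times.
G = nx.grid_2d_graph(6, 6).to_directed()
for u, v in G.edges():
    # asymmetric cost: moving 'uphill' in y is slower (illustration only)
    G[u][v]["weight"] = 2.0 if v[1] > u[1] else 1.0

# Keep only the largest strongly connected component, so every retained
# vertex is reachable from every other vertex.
largest = max(nx.strongly_connected_components(G), key=len)
H = G.subgraph(largest).copy()

# Shortest path tree from a chosen origin: the predecessor of each vertex
# on its least-cost route back to the origin defines the tree edges.
origin = (0, 0)
preds, dist = nx.dijkstra_predecessor_and_distance(H, origin, weight="weight")
tree_edges = [(p[0], v) for v, p in preds.items() if p]
```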
4.3 Betweenness centrality
The connectivity properties of a hiking graph can be described using a range of measures of graph structure and used to reveal the relative likelihood of different parts of a terrain being frequently traversed. The particular structural measure we focus on is betweenness centrality, which is explained below.
In a graph \(G=(V,E)\) a path \(P\) is an ordered sequence of vertices, \(P=\left(v_1,v_2,\ldots,v_n\right)\) such that each consecutive pair of vertices \(v_i\) and \(v_{i+1}\) is adjacent, i.e. connected by a directed edge \(e_{i,i+1}\). The length of a path is the number of edges it contains. The weighted length or cost of a path is the sum over its edges of a weight or cost associated with each edge, which we here denote \(w_{i,j}\), i.e., the length \(L\) of a path \(P\) is given by \[ L(P)=\sum_{i=1}^{n-1} w_{i,i+1} \tag{5}\]
The shortest path from \(v_i\) to \(v_j\) is the path \(P=\left(v_i,\dots,v_*,\ldots,v_j\right)\) starting at \(v_i\) and ending at \(v_j\), passing through intervening vertices \(\left(v_*\right)\), such that \(L(P)\) is minimised. In a regular lattice such as our hiking networks there are many shortest paths of equal length if the weight associated with each edge is equal, as it would be if we based it solely on the length of the edge between each pair of vertices. When we consider edge traversal costs derived from the slope of each edge and a hiking function, then the shortest path will be unique, or one of only a small number of possibilities of equal total cost.
Shortest paths are used to develop many measures of vertex centrality in graphs (Freeman 1978). One such measure is betweenness centrality. The betweenness centrality of a vertex is the total number of times that vertex appears on shortest paths between every other pair of vertices in the network. If we denote the number of shortest paths or geodesics between two vertices \(v_j\) and \(v_k\) by \(g_{jk}\), then each appearance of a vertex \(v_i\) on a shortest path between those vertices contributes only \(1/g_{jk}\) to the betweenness centrality of \(v_i\). Formally, if we denote the number of times vertex \(v_i\) appears on shortest paths between \(v_j\) and \(v_k\) by \(g_{jk}(v_i)\), then its betweenness \(b_i\) is given by \[ b_i=\sum_{j\neq i}\sum_{k\neq i,j}\frac{g_{jk}(v_i)}{g_{jk}} \tag{6}\] Betweenness centrality is particularly relevant to applications where we are interested in how important vertices are to movement across the graph. An edge betweenness centrality measure can also be calculated on similar principles, but is not considered further here, in part because visualization is challenging given the potential for different scores for the two directed edges between each pair of vertices. Vertex betweenness, on the other hand, yields a single value for each vertex.
As might be expected betweenness centralities are computationally demanding to calculate. The time complexity of early algorithms was \(\mathcal{O}(n^3)\), where \(n\) is the number of vertices in the graph (Brandes 2001). An implementation of Brandes’s algorithm (2001) with time complexity \(\mathcal{O}(nm + n^2\log n)\), where \(m\) is the number of edges in the graph, is provided in the igraph package (Csárdi et al. 2024). Even with this improvement in performance, in our application such computational complexity is a strong motivation for working at a nominal 100 metre resolution. Halving the resolution to 50 metres would increase the number of graph vertices four-fold, and lead to a substantial increase in the time taken to calculate betweenness centralities.
Results of betweenness centrality for our example hiking network are shown in Figure 10. While ‘all-paths’ betweenness centralities can be standardised relative to a theoretical maximum of \((n-1)(n-2)\), no straightforward standardisation is possible for the radius-limited case. Since we are primarily interested in betweenness as a measure of the relative vulnerability of different locations to human impacts, we linearly rescaled betweenness scores in all cases with respect to the range of scores across the graph being analysed. The rescaled betweenness, \(b_i^\prime\) for vertex \(v_i\) is given by \[ b_i^\prime=\frac{b_i-b_{\min}}{b_{\max}-b_{\min}} \tag{7}\] This approach is also applicable to radius-limited betweenness scores (see Section 4.4).
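As a concrete sketch of the calculation and of the rescaling in Equation 7, the following Python fragment computes raw betweenness on a toy lattice with networkx (which, like igraph, implements Brandes’s algorithm) and then linearly rescales the scores; the grid and its unit weights are illustrative only.

```python
import networkx as nx

# Toy weighted digraph standing in for a hiking network.
G = nx.grid_2d_graph(5, 5).to_directed()
nx.set_edge_attributes(G, 1.0, "weight")

# All-paths betweenness via Brandes's algorithm; normalized=False
# returns raw (fractional) path counts as in Equation 6.
b = nx.betweenness_centrality(G, weight="weight", normalized=False)

# Equation 7: linear rescale of scores to the interval [0, 1].
bmin, bmax = min(b.values()), max(b.values())
b_scaled = {v: (s - bmin) / (bmax - bmin) for v, s in b.items()}
```

On this symmetric toy grid the centre vertex carries the most shortest paths and so rescales to 1, analogous to the bottleneck vertices highlighted in Figure 10.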
When all-paths betweenness is calculated, bottlenecks between large sub-areas are strongly highlighted as is the case for the ‘inner bend’ of the relatively narrow neck of navigable terrain that connects the east and west sub-areas of this site. The other feature of this map is the identification of a distinctive structure of ‘arterial’ routes.
4.4 Betweenness centrality limited by radius
The igraph implementation of betweenness centrality provides an option to radius-limit betweenness calculations, meaning that only paths shorter than a specified radius (expressed in cost units, here of time) are considered in counting the appearance of vertices on shortest paths. This approach is particularly useful for large graphs, where the time complexity of calculating radius-limited betweenness scales more favourably than cited above, because the number of shortest paths that must be identified is greatly reduced. We informally confirmed the intuition that for a given radius limit the time taken to compute betweenness scales approximately linearly with the number of vertices in the graph, since the number of shortest paths local to each vertex, on which it might be found, is similar.
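The same idea can be sketched directly: for each vertex, run a cost-truncated Dijkstra search and count appearances of vertices in the interior of the resulting paths. This is an illustrative Python approximation, not igraph’s implementation, and it counts only the single shortest path networkx returns for each pair rather than splitting credit among tied paths.

```python
import networkx as nx

def radius_limited_betweenness(G, radius, weight="weight"):
    """Approximate radius-limited betweenness: count appearances of each
    vertex in the interior of least-cost paths whose total cost does not
    exceed `radius` (in the edge-weight units, here minutes)."""
    score = dict.fromkeys(G, 0.0)
    for source in G:
        # Dijkstra truncated at the radius: only nearby targets reached.
        _, paths = nx.single_source_dijkstra(G, source, cutoff=radius,
                                             weight=weight)
        for target, path in paths.items():
            for v in path[1:-1]:          # interior vertices only
                score[v] += 1.0
    return score

# toy lattice with unit traversal costs (illustration only)
G = nx.grid_2d_graph(5, 5).to_directed()
nx.set_edge_attributes(G, 1.0, "weight")
local = radius_limited_betweenness(G, radius=2.0)
```

Because each truncated search touches only a bounded neighbourhood, total work grows roughly linearly with the number of vertices, matching the scaling behaviour noted above.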
Results for a series of radius limits are shown in Figure 11. At very short radii (the 30 minutes case) many locally important paths are identified as potentially relatively heavily trafficked. As the radius increases the pattern emphasizes potential paths that are important across wider areas, eventually tending toward the arterial structure of the all-paths case. We discuss these results more fully in Section 5.
4.5 Impact minimizing networks
In this section we explore calculating betweenness centrality for a limited set of locations, which might represent a base camp, rest sites, and experimental sites for a season’s expedition.
This is relatively straightforward and, like radius-limited betweenness, involves restricting the shortest paths considered to only those among a chosen subset of vertices in the graph. One slight simplification we make for performance reasons is to weight every appearance of a vertex on a geodesic as equal, even if multiple equal length shortest paths exist between two vertices. In practice, this is unlikely to have much effect on the resulting estimates of relative betweenness, since with real-valued edge costs two paths of exactly equal cost are unlikely. An example result is shown in Figure 12. One refinement shown in the figure is that sites have been weighted by relative importance, so that a site such as an expedition base camp can be expected to originate more trips than other sites.
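A minimal sketch of this site-restricted calculation, in Python with networkx rather than the R/igraph code used here, might look as follows; the network, the sites, and the trip weights are all hypothetical, and one shortest path is counted per pair, as in the simplification just described.

```python
import networkx as nx

def site_betweenness(G, site_weights, weight="weight"):
    """Betweenness restricted to shortest paths among a chosen set of
    sites. site_weights maps each site vertex to a relative trip
    frequency, e.g. a base camp that originates more trips."""
    score = dict.fromkeys(G, 0.0)
    for a, wa in site_weights.items():
        for b, wb in site_weights.items():
            if a == b:
                continue
            path = nx.dijkstra_path(G, a, b, weight=weight)
            for v in path:                # count every vertex on the route
                score[v] += wa * wb
    return score

# toy lattice; the base camp at (0, 0) is weighted 3x the other sites
G = nx.grid_2d_graph(7, 7).to_directed()
nx.set_edge_attributes(G, 1.0, "weight")
sites = {(0, 0): 3.0, (6, 0): 1.0, (0, 6): 1.0, (6, 6): 1.0}
scores = site_betweenness(G, sites)
```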
A further analytical possibility is to use the hiking network to develop a set of routes among the sites that might minimise total impact, by confining movement to a limited set of paths. This approach is consistent with work in ecology and park and wildlife management which suggests that path networks can reduce the impact of human movement in many environments (Cole 1995a; Cole 1995b; Kidd et al. 2015; Marion et al. 2016; Piscová et al. 2023; Tomczyk and Ewertowski 2023).
The approach taken is first to determine the lengths of all the shortest paths among all the sites. Then a minimum spanning tree for the sites is calculated, which is a graph with minimum total edge length that connects all the sites (Prim 1957). One complication, which we do not account for, is that algorithms for finding the minimum spanning tree of a directed graph (referred to as a minimum-cost arborescence or optimal branching) are more complex (Korte and Vygen 2018), and not at present supported in igraph. While there is some asymmetry in paths between locations depending on the direction of travel, it is not extreme, and so we approximate a solution by finding the minimum spanning tree links (i.e. direct connections) among sites treating our network as undirected, and then determine paths over the ground to make the identified connections between sites based on the directed hiking network. The resulting path network for the same set of sites as in Figure 12 is shown in Figure 13.
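The three steps just described (site-to-site shortest path lengths, a minimum spanning tree over those distances, and mapping tree links back to routes over the ground) can be sketched as follows, again using networkx on a toy network in place of the R/igraph code; the network and sites are illustrative.

```python
import networkx as nx

# Toy hiking network and expedition sites (illustration only).
G = nx.grid_2d_graph(7, 7).to_directed()
nx.set_edge_attributes(G, 1.0, "weight")
sites = [(0, 0), (6, 0), (0, 6), (6, 6), (3, 3)]

# Step 1: complete graph of least-cost distances among sites, treating
# the network as undirected (the approximation described in the text).
U = G.to_undirected()
K = nx.Graph()
for i, a in enumerate(sites):
    for b in sites[i + 1:]:
        K.add_edge(a, b, weight=nx.dijkstra_path_length(U, a, b,
                                                        weight="weight"))

# Step 2: minimum spanning tree over the site-to-site distances.
mst = nx.minimum_spanning_tree(K, weight="weight")

# Step 3: map each MST link back to a route over the directed network.
routes = {(a, b): nx.dijkstra_path(G, a, b, weight="weight")
          for a, b in mst.edges()}
```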
5 Discussion
- Limitations of the method:
- Data concerns, esp. given improvements in GPS data collection since 2016/18
- Would higher resolution network be useful? (we think probably not)
- Possible limitations of a regularly spaced lattice: extend to a lattice based on e.g., TIN?
- Reliance on shortest paths as a guide to behaviour
- Some minor issues regarding the difference between directed and undirected graphs (to check using networkx)
- Relevance of radius limits on betweenness: the implicit model of movement in this approach (short horizons, no existing paths, ‘keep going for half an hour in this general direction…’)
- Question of embedding paths, i.e., locking in more damage to some parts of the landscape vs others (this leads to the question in conclusion re method for less damaging plans)
6 Conclusions
To cover:
- Novelty of fitting land cover specific hiking functions
- A method for exploring potential impacts in a novel environment
- A method for proposing less damaging plans for expeditions
- Other?